Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve char.IsWhiteSpace for non-ASCII #111569

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

MihaZupan
Copy link
Member

char.IsWhiteSpace looks something like

bool IsWhiteSpace(char c)
{
    if (c < 256) return Latin1LookupTable[c]:

    return Fallback(c);
}

static bool Fallback(char ch)
{
    nuint offset = GetCategoryCasingTableOffsetNoBoundsChecks(ch);
    return (sbyte)Unsafe.AddByteOffset(ref MemoryMarshal.GetReference(CategoriesValues), offset) < 0;
}

If you just run a microbenchmark over this with non-ASCII input, things will be fine as PGO will inline the whole thing together.

But if IsWhiteSpace ends up tiering up without seeing data super skewed for non-ASCII, the GetCategoryCasingTableOffsetNoBoundsChecks call in the fallback case won't be inlined, slowing down any future queries for non-ASCII chars.

public class NonAsciiTest
{
    private readonly string _text = string.Create(50, 0, (buffer, _) => new Random(42).NextBytes(MemoryMarshal.Cast<char, byte>(buffer)));

    public NonAsciiTest()
    {
        for (int i = 0; i < 20; i++)
        {
            for (int j = 0; j < 150; j++)
            {
                _ = char.IsWhiteSpace((char)j);
            }

            Thread.Sleep(16);
        }
    }

    [Benchmark]
    public int CountWhiteSpace()
    {
        int count = 0;
        foreach (char c in _text)
        {
            if (char.IsWhiteSpace(c))
            {
                count++;
            }
        }
        return count;
    }
}
Method Toolchain Mean Error Ratio
CountWhiteSpace main 152.94 ns 3.041 ns 1.00
CountWhiteSpace pr 95.31 ns 0.439 ns 0.62

@MihaZupan MihaZupan added this to the 10.0.0 milestone Jan 18, 2025
@MihaZupan MihaZupan self-assigned this Jan 18, 2025
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-globalization
See info in area-owners.md if you want to be subscribed.

@stephentoub
Copy link
Member

@AndyAyersMS, @EgorBo, is this expected?

@EgorBo
Copy link
Member

EgorBo commented Jan 18, 2025

@AndyAyersMS, @EgorBo, is this expected?

The fact that PGO may slow things if they change behavior after Tier1-PGO was reached? Yes. Unfortunately, we're quite far away from being able to deoptimize/recollect PGO for such cases.

Although, it is indeed possible that inliner is too strict for blocks which currently have 0 weights, but if we relax it - we'll have to inline less in hot ones to maintain the balance

@stephentoub
Copy link
Member

it is indeed possible that inliner is too strict for blocks which currently have 0 weights,

This

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants